Hierarchical Deep Recurrent Architecture for Video Understanding

Authors

  • Luming Tang
  • Boyang Deng
  • Haiyu Zhao
  • Shuai Yi
Abstract

This paper introduces the system we developed for the YouTube-8M Video Understanding Challenge, in which a large-scale benchmark dataset [1] was used for multi-label video classification. The proposed framework has a hierarchical deep architecture consisting of a frame-level sequence modeling part and a video-level classification part. In the frame-level sequence modeling part, we explore a set of methods, including Pooling-LSTM (PLSTM), Hierarchical-LSTM (HLSTM), and Random-LSTM (RLSTM), to address the problem of the large number of frames in a video. We also introduce two attention pooling methods, single attention pooling (ATT) and multiple attention pooling (Multi-ATT), so that the model pays more attention to the informative frames in a video and ignores the uninformative ones. In the video-level classification part, two methods are proposed to improve classification performance: Hierarchical-Mixture-of-Experts (HMoE) and Classifier Chains (CC). Our final submission is an ensemble of 18 sub-models. In terms of the official evaluation metric, Global Average Precision (GAP) at 20, our best submission achieves 0.84346 on the public 50% of the test data and 0.84333 on the private 50% of the test data.
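
To make the evaluation metric named in the abstract concrete, the following is a minimal sketch of GAP at 20. It assumes the usual challenge definition: keep the 20 highest-scoring labels per video, pool those predictions across all videos, sort them by confidence, and accumulate precision at each correct prediction, normalized by the total number of ground-truth labels. The function name `gap_at_k`, its argument layout, and the normalization choice are assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence


def gap_at_k(scores: Sequence[Sequence[float]],
             labels: Sequence[Sequence[int]],
             k: int = 20) -> float:
    """Global Average Precision over the top-k predictions of each video.

    scores[v][c] is the predicted confidence of class c for video v;
    labels[v][c] is 1 if class c is a ground-truth label of video v, else 0.
    (Hypothetical interface, chosen for this sketch only.)
    """
    pooled = []          # (confidence, is_correct) pairs from all videos
    total_positives = 0  # ground-truth labels across all videos (assumed denominator)

    for video_scores, video_labels in zip(scores, labels):
        total_positives += sum(video_labels)
        # Indices of the k highest-scoring classes for this video.
        top = sorted(range(len(video_scores)),
                     key=lambda c: video_scores[c], reverse=True)[:k]
        pooled.extend((video_scores[c], video_labels[c]) for c in top)

    if total_positives == 0:
        return 0.0

    # Rank all pooled predictions globally by confidence.
    pooled.sort(key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, is_correct) in enumerate(pooled, start=1):
        if is_correct:
            hits += 1
            precision_sum += hits / rank  # precision at this rank

    # Each correct prediction raises recall by 1 / total_positives.
    return precision_sum / total_positives
```

Calling `gap_at_k(scores, labels)` with per-video score and label lists returns a value between 0 and 1. Because predictions from all videos are ranked together, the metric rewards confidences that are calibrated across videos, not only a correct ordering within each video.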

Similar resources

Automatic Video Captioning using Deep Neural Network

Video understanding has become increasingly important as surveillance, social, and informational videos weave themselves into our everyday lives. Video captioning offers a simple way to summarize, index, and search the data. Most video captioning models utilize a video encoder and captioning decoder framework. Hierarchical encoders can abstractly capture clip level temporal features to represen...

An Efficient Hierarchical Modulation based Orthogonal Frequency Division Multiplexing Transmission Scheme for Digital Video Broadcasting

Due to the increase in users, the efficient use of spectrum plays an important role in digital terrestrial television networks. In digital video broadcasting, local and global content are transmitted by single-frequency networks and multifrequency networks, respectively. Multifrequency networks support transmission of global content and consume a large amount of spectrum. Similarly local content are well ...

Deep Reservoir Computing: A Critical Analysis

In this paper we propose an empirical analysis of deep recurrent neural networks (RNNs) with stacked layers. The analysis aims at the study and proposal of approaches to develop and enhance multiple timescale and hierarchical dynamics in deep recurrent architectures, within the efficient Reservoir Computing (RC) approach for RNN modeling. Results point out the actual relevance of layering and R...

Pillar Networks for action recognition

Image understanding using deep convolutional networks has reached human-level performance, yet the closely related problem of video understanding, especially action recognition, has not reached the requisite level of maturity. We combine multi-kernel support vector machines (SVMs) with a multi-stream deep convolutional neural network to achieve close to state-of-the-art performance on a 51-cl...

A deep architecture for audio-visual voice activity detection in the presence of transients

We address the problem of voice activity detection in difficult acoustic environments including high levels of noise and transients, which are common in real life scenarios. We consider a multimodal setting, in which the speech signal is captured by a microphone, and a video camera is pointed at the face of the desired speaker. Accordingly, speech detection translates to the question of how to ...

Journal:
  • CoRR

Volume: abs/1707.03296

Publication date: 2017